Appendix A  Convergence of CPED with the hybrid loss

This appendix first presents preliminaries and a formal version of Theorem 4.1 with its proof; under the stated assumptions on the discriminator class, the bound follows from the triangle inequality together with Eq. A.2, Eq. A.11, Eq. A.12, and Eq. A.14. It then proves Proposition 3.1; gives a brief proof of Theorem 4.2, showing that the learned policy can find the stationary point of the Bellman equation w.r.t. the production sample space; and gives a brief proof of Theorem 4.3, showing convergence of the learned policy, starting from the monotonic improvement of the Q function of the policy iterated by CPED. Gym-MuJoCo is a commonly used benchmark for offline RL tasks.
DRDT3: Diffusion-Refined Decision Test-Time Training Model
Huang, Xingshuai, Wu, Di, Boulet, Benoit
Decision Transformer (DT), a trajectory modeling method, has shown competitive performance compared to traditional offline reinforcement learning (RL) approaches on various classic control tasks. However, it struggles to learn optimal policies from suboptimal, reward-labeled trajectories. In this study, we explore the use of conditional generative modeling to facilitate trajectory stitching, given its high-quality data generation ability. Additionally, recent advances in Recurrent Neural Networks (RNNs) have demonstrated linear complexity and sequence modeling performance competitive with Transformers. We leverage the Test-Time Training (TTT) layer, an RNN that updates hidden states during testing, to model trajectories in the form of DT. We introduce a unified framework, called Diffusion-Refined Decision TTT (DRDT3), to achieve performance beyond DT models. Specifically, we propose the Decision TTT (DT3) module, which harnesses the sequence modeling strengths of both self-attention and the TTT layer to capture recent contextual information and make coarse action predictions. We further integrate DT3 with the diffusion model using a unified optimization objective. In experiments on multiple Gym and AntMaze tasks from the D4RL benchmark, our DT3 model without diffusion refinement demonstrates improved performance over standard DT, while DRDT3 further achieves superior results compared to state-of-the-art conventional offline RL and DT-based methods.
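The TTT layer described above maintains a hidden state that is itself a small model, updated by self-supervised gradient steps on each incoming token at test time. A minimal sketch of this idea follows; the linear inner model, shapes, and learning rate are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def ttt_layer(tokens, lr=0.1):
    """Toy Test-Time Training (TTT) layer sketch: the hidden state is a
    small linear model W, updated by one gradient step of a self-supervised
    reconstruction loss for every incoming token, even at test time."""
    d = tokens.shape[1]
    W = np.zeros((d, d))                 # hidden state = weights of inner model
    outputs = []
    for x in tokens:
        pred = W @ x                     # inner model's reconstruction of x
        grad = np.outer(pred - x, x)     # gradient of 0.5*||W x - x||^2 w.r.t. W
        W = W - lr * grad                # test-time update of the hidden state
        outputs.append(W @ x)            # output token uses the updated state
    return np.stack(outputs)
```

Because the state is updated by gradient descent rather than a fixed recurrence, the reconstruction of a repeated token improves over the sequence, while the per-token cost stays linear in sequence length.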
Revisiting the Minimalist Approach to Offline Reinforcement Learning
Tarasov, Denis, Kurenkov, Vladislav, Nikulin, Alexander, Kolesnikov, Sergey
Recent years have witnessed significant advancements in offline reinforcement learning (RL), resulting in the development of numerous algorithms with varying degrees of complexity. While these algorithms have led to noteworthy improvements, many incorporate seemingly minor design choices that impact their effectiveness beyond core algorithmic advances. However, the effect of these design choices on established baselines remains understudied. In this work, we aim to bridge this gap by conducting a retrospective analysis of recent works in offline RL and propose ReBRAC, a minimalistic algorithm that integrates such design elements built on top of the TD3+BC method. We evaluate ReBRAC on 51 datasets with both proprioceptive and visual state spaces using D4RL and V-D4RL benchmarks, demonstrating its state-of-the-art performance among ensemble-free methods in both offline and offline-to-online settings. To further illustrate the efficacy of these design choices, we perform a large-scale ablation study and hyperparameter sensitivity analysis on the scale of thousands of experiments.
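ReBRAC layers its design elements on top of TD3+BC, whose actor objective maximizes Q under an adaptively scaled behavior-cloning penalty. A minimal sketch of that baseline loss, assuming the standard TD3+BC formulation (ReBRAC's additional choices, such as its critic-side penalty and decoupled coefficients, are omitted):

```python
import numpy as np

def td3_bc_actor_loss(q_values, policy_actions, data_actions, alpha=2.5):
    """TD3+BC-style actor loss: maximize Q while staying close to the
    dataset actions. `alpha` follows the TD3+BC paper; the lambda term
    normalizes the Q scale so one alpha works across environments."""
    lam = alpha / (np.mean(np.abs(q_values)) + 1e-8)    # adaptive Q scaling
    bc = np.mean((policy_actions - data_actions) ** 2)  # behavior-cloning term
    return -lam * np.mean(q_values) + bc                # minimize this loss
```

The adaptive scaling is the kind of "seemingly minor design choice" the abstract refers to: without it, the relative weight of the BC term would depend on the magnitude of the rewards in each dataset.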
When Data Geometry Meets Deep Function: Generalizing Offline Reinforcement Learning
Li, Jianxiong, Zhan, Xianyuan, Xu, Haoran, Zhu, Xiangyu, Liu, Jingjing, Zhang, Ya-Qin
In offline reinforcement learning (RL), one detrimental issue for policy learning is the error accumulation of the deep Q function in out-of-distribution (OOD) areas. Unfortunately, existing offline RL methods are often over-conservative, inevitably hurting generalization performance outside the data distribution. In our study, one interesting observation is that deep Q functions approximate well inside the convex hull of the training data. Inspired by this, we propose a new method, DOGE (Distance-sensitive Offline RL with better GEneralization). DOGE marries dataset geometry with deep function approximators in offline RL, and enables exploitation in generalizable OOD areas rather than strictly constraining the policy within the data distribution. Specifically, DOGE trains a state-conditioned distance function that can be readily plugged into standard actor-critic methods as a policy constraint. Simple yet elegant, our algorithm enjoys better generalization compared to state-of-the-art methods on D4RL benchmarks. Theoretical analysis demonstrates the superiority of our approach over existing methods that are based solely on data distribution or support constraints.

Offline reinforcement learning (RL) provides a new possibility for learning optimized policies from large, pre-collected datasets without any environment interaction (Levine et al., 2020). This holds great promise for solving many real-world problems where online interaction is costly or dangerous yet historical data is easily accessible (Zhan et al., 2022). However, the optimization nature of RL, as well as the need for counterfactual reasoning on unseen data in the offline setting, poses great technical challenges for designing effective offline RL algorithms.
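The distance-based policy constraint described above can be sketched as a hinge penalty added to a standard actor objective. Purely for illustration, this sketch replaces DOGE's learned state-conditioned distance function with the Euclidean distance to the nearest dataset action, and treats the threshold and weight as assumed hyperparameters:

```python
import numpy as np

def doge_style_actor_objective(q_value, action, dataset_actions,
                               eps=0.5, beta=1.0):
    """DOGE-style constrained actor objective (sketch): penalize the actor
    only when its distance to the data exceeds a threshold, allowing
    exploitation in nearby (generalizable) OOD regions. The learned
    distance function is stood in for by a nearest-neighbor distance."""
    dist = np.min(np.linalg.norm(dataset_actions - action, axis=1))
    penalty = beta * max(0.0, dist - eps)  # zero penalty inside the threshold
    return -q_value + penalty              # minimize: maximize Q, stay near data
```

In contrast to a hard support constraint, actions within `eps` of the data incur no penalty at all, so the policy is free to exploit the region where the Q function still generalizes.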
Evaluating the value function outside data coverage areas can produce falsely optimistic values; without corrective information from online interaction, such estimation errors can accumulate quickly and misguide the policy learning process (Van Hasselt et al., 2018; Fujimoto et al., 2018; Kumar et al., 2019). Recent model-free offline RL methods address this error accumulation challenge in several ways: 1) Policy Constraint: directly constraining the learned policy to stay within the distribution, or the support, of the dataset (Kumar et al., 2019); 2) Value Regularization: regularizing the value function to assign low values to out-of-distribution (OOD) actions (Kumar et al., 2020b); 3) In-sample Learning: learning the value function within data samples (Kostrikov et al., 2021b) or simply treating it as the value function of the behavioral policy (Brandfonbrener et al., 2021). All three families of methods share the trait of being conservative and omitting evaluation on OOD data, which brings the benefit of minimizing model exploitation error, but at the expense of poor generalization of the learned policy in OOD regions.
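As a concrete instance of the value-regularization family, a CQL-style penalty (in the spirit of Kumar et al., 2020b) pushes down values over all actions via a log-sum-exp while pushing up the value at the dataset action. This sketch assumes a discrete action set and an illustrative weight; it would be added to the usual TD loss:

```python
import numpy as np

def cql_style_penalty(q_all_actions, q_data_action, alpha=1.0):
    """Value-regularization sketch: log-sum-exp over Q-values of all
    actions acts as a soft maximum pushed down, while the Q-value at the
    logged (in-dataset) action is pushed up, yielding conservative
    estimates for OOD actions."""
    soft_max_q = np.log(np.sum(np.exp(q_all_actions)))  # soft max over actions
    return alpha * (soft_max_q - q_data_action)
```

The penalty is smallest when the value mass concentrates on the dataset action, which is exactly the conservatism (and the generalization cost) the paragraph above describes.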